Basins of Attraction in a Perceptron-like Neural Network

Werner Krauth, Marc Mézard, and Jean-Pierre Nadal

Abstract
We study the performance of a neural network of the perceptron type. We isolate two important sets of parameters which render the network fault tolerant (existence of large basins of attraction) in both hetero-associative and auto-associative systems, and we study the size of the basins of attraction (the maximal allowable noise level still ensuring recognition) for sets of random patterns. The relevance of our results to the perceptron's ability to generalize is pointed out, as is the role of diagonal couplings in the fully connected Hopfield model.

1. Introduction

An important aspect of the physicists' approach to the study of neural networks has been to concentrate on some standard situations which can be described as probability distributions of instances. For these one can then obtain quantitative comparisons of the performances of different networks for large numbers of neurons and connections. A typical example is Hopfield's model [1] of associative memory. In order to quantify its performance, it has been calculated how many independent, randomly chosen patterns can be stored with such an architecture, in the "thermodynamic limit" where the number N of neurons is large. For unbiased patterns the original Hebb rule allows the storage of 0.14N patterns [2,3], and more sophisticated, but still perceptron-type, rules [3-5] can reach the upper storage limit [7,8] of 2N patterns. While Hopfield's model and variants of it have been studied thoroughly from a statistical physics point of view (for recent reviews see [9,10]), other widely used models such as layered networks [11] have not been analyzed in this way so far.*

* Laboratoire Propre du Centre National de la Recherche Scientifique, associé à l'École Normale Supérieure et à l'Université de Paris Sud.

In this paper we shall deal with the simplest such network, namely the perceptron, which consists of two layers (the usual description of a perceptron [12] contains an initial layer which ensures some frozen precoding; in this paper we will not consider this first stage). In particular, we study its associative properties, which are interesting even though the limitations of the perceptron are well known [13]. A recent review of previous studies of associative properties in other two-layer networks can be found in [14].

Associativity is an important feature of neural networks as it allows for the correction of errors: even noisy input configurations can be mapped close to the desired output in the sense of Hamming distance. This is a linearly separable problem, and therefore it can be solved by a perceptron, in contrast to, e.g., the parity and connectivity problems, which fall into a different class of computational problems, where the correlations between input configurations are not naturally related to the Hamming distance, and where the definition of noise would not be appropriate.

Hereafter we shall study the storage capacity of the perceptron, concentrating on the size of the basins of attraction. The basic result is that the size of the basin of attraction of a pattern depends primarily on its stability. (The precise definition of "stability" is given in the next section. For the pattern to be recognizable by the network in the absence of noise, its stability has to be positive.)
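The storage-capacity figures quoted above are easy to probe numerically. The following is a minimal sketch (ours, not from the paper; all names and sizes are hypothetical) that builds Hebb-rule couplings for a Hopfield network and measures the fraction of pattern spins whose local field disagrees with the pattern, i.e. the fraction of spins with negative stability:

```python
import numpy as np

rng = np.random.default_rng(0)

def fraction_unstable(N, alpha):
    """Fraction of pattern spins misaligned with their local field
    under the Hebb rule, at storage level p = alpha * N."""
    p = int(alpha * N)
    xi = rng.choice([-1, 1], size=(p, N))   # p random unbiased patterns
    J = (xi.T @ xi) / N                     # Hebb rule: J_ij = (1/N) sum_mu xi_i xi_j
    np.fill_diagonal(J, 0.0)                # no self-couplings
    h = xi @ J                              # local fields on each stored pattern
    return np.mean(h * xi < 0)              # spins that would flip in one update

for alpha in (0.05, 0.14, 0.25):
    print(f"alpha = {alpha:.2f}: {fraction_unstable(N=1000, alpha=alpha):.4f}")
```

For large N this single-update error fraction grows with alpha roughly as (1/2)erfc(1/sqrt(2 alpha)); the 0.14N capacity itself follows from the full analysis of the retrieval dynamics [2,3], not from this one-step criterion.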
For independent random patterns (which may be biased or not) we then calculate the typical stabilities of the patterns achieved by two learning rules: the pseudoinverse rule [15,24] and the minimal overlap rule [6], which can reach optimal stability. Besides fully determining the associative power, knowledge about the stability achieved in the network gives us information about its capacity; an interesting outcome of our analysis is that the optimal capacity (defined as the ratio of the number of stored patterns to the number of neurons in the input layer) tends to infinity when all the output patterns coincide, provided the input patterns are correlated. This result can be interpreted as reflecting the perceptron's ability to generalize: it is able to infer a simple rule from a large enough number of examples.

When studying auto-association in a perceptron (mapping the patterns ξ^μ and their nearby configurations onto themselves) we shall see that a second parameter becomes important in order to obtain large basins of attraction: the values of the diagonal elements in the matrix of couplings, which link the neurons to themselves and tend to freeze the configurations. As the problem of auto-association can be regarded as one single parallel update of a Hopfield network, we then emphasize the relevance of these results to the fully connected Hopfield model. We show by numerical simulations that the stability and the strength of the diagonal couplings are indeed two important parameters for the dynamics of the Hopfield net. There exists an optimal value of the diagonal couplings which maximizes the radius of the basins of attraction.

The simple picture that emerges, namely that the stability of a perceptron governs its static properties (the storage capacity) as well as its dynamics (associativity), becomes considerably more complicated as soon as one allows several iterations of the perceptron's mapping. The correlations of the synaptic strengths then start to play an important role, especially the degree of symmetry of the matrix, and it is no longer possible to make statements as general as for the perceptron. These questions have been stressed in another article [16], which is complementary to the present one. Related work on the role of the stability can be found in [17,18].

The plan of this article is as follows: In section 2 we define the network, its dynamics, the notion of attraction basins, and the probability distribution of the patterns to be used for quantitative analysis. In section 3 we compute the quality of retrieval for a noisy input for two general classes of coupling matrices. Section 4 contains a detailed comparison of the associative properties of two specific learning rules: the pseudoinverse and the minimal overlap rules. In section 5 the relevance of the results to auto-association in fully connected networks is discussed. Section 6 shows how some of the results can be interpreted as the perceptron's ability to generalize. Lastly, some concluding remarks are given in section 7.

2. Dynamics of a two-layer network

We study a network of the perceptron type which consists of two layers of neurons. The neurons are Boolean units which we write as (Ising) spins taking values ±1. The input layer consists of N spins σ = {σ_j = ±1, j = 1,...,N} and the output layer contains N' spins σ' = {σ'_i = ±1, i = 1,...,N'}.
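With this notation in place, the pseudoinverse rule mentioned in the introduction admits a one-line implementation. The sketch below is ours, not the paper's (sizes and names are hypothetical): stacking the p input patterns as rows of a (p × N) matrix Ξ and the output patterns as rows of a (p × N') matrix Ξ', the choice J = Ξ'^T (Ξ^+)^T, where Ξ^+ is the Moore-Penrose pseudoinverse, maps every stored input exactly onto its output whenever the input patterns are linearly independent:

```python
import numpy as np

# Hedged sketch of the pseudoinverse rule (not the paper's code).
rng = np.random.default_rng(1)
N, N_out, p = 200, 100, 50                 # hypothetical sizes, p < N

xi_in = rng.choice([-1, 1], size=(p, N))   # input patterns, one per row
xi_out = rng.choice([-1, 1], size=(p, N_out))

# J has shape (N' x N); for linearly independent inputs, J @ xi_in[mu] = xi_out[mu].
J = xi_out.T @ np.linalg.pinv(xi_in).T

fields = xi_in @ J.T                       # row mu holds J @ xi_in[mu]
assert np.all(np.sign(fields) == xi_out)   # every pattern mapped exactly
```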
We shall concentrate on the limiting case where the numbers of neurons N and N' both go to infinity. The coupling (synapse) between neuron σ_j of the input layer and neuron σ'_i of the output layer is denoted by J_ij, so that the coupling matrix (J_ij) is of size N' × N. The output corresponding to a given input configuration is given by a (zero-)threshold automaton rule:

$$\sigma'_i = \mathrm{sign}\left( \sum_{j=1}^{N} J_{ij} \sigma_j \right), \qquad i = 1, \ldots, N'. \tag{2.1}$$

The network is taught (through the determination of the J_ij) to map each of the p = αN input patterns ξ^μ = {ξ^μ_j = ±1, j = 1,...,N} onto a certain output pattern ξ'^μ = {ξ'^μ_i = ±1, i = 1,...,N'}. We shall distinguish between two different cases: hetero-association, in which input and output patterns differ, and auto-association, in which they are identical. In the latter case we have N' = N and the coupling matrix is square; a special role will then be played by the diagonal coupling matrix elements J_ii, which connect corresponding neurons (i) on the input and on the output layer.

Whenever we need to specialize to a specific distribution of patterns (mostly in section 4), we shall consider the case where the patterns are chosen randomly following the prescription

$$\xi^\mu_j = \begin{cases} +1 & \text{with probability } (1+m)/2 \\ -1 & \text{with probability } (1-m)/2 \end{cases} \qquad j = 1, \ldots, N. \tag{2.2}$$

The probabilities are adjusted so that the patterns carry a mean magnetization m ≡ (1/N) Σ_j ξ^μ_j (the parameter m is related to the activity of the neurons). In the case of hetero-association the output patterns are similarly chosen at random, with magnetization m'. This type of bias, and its generalization to more structured, hierarchically correlated patterns, has been studied in the case of the Hopfield model [19-21].

For associativity we need that configurations close to ξ^μ also be mapped close to ξ'^μ. To give this notion a precise meaning we shall suppose that the input configuration σ is chosen randomly, but with a fixed overlap q with the pattern:

$$q = \frac{1}{N} \sum_{j=1}^{N} \sigma_j \xi^\mu_j.$$
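The following sketch (ours; the sizes and the reuse of the pseudoinverse rule are our choices, not the paper's) ties the definitions of this section together: it draws biased patterns according to (2.2), prepares an input with prescribed overlap q with one stored pattern, and applies the update rule (2.1) once:

```python
import numpy as np

rng = np.random.default_rng(2)
N, N_out, p, m, q = 400, 200, 20, 0.2, 0.8   # hypothetical parameter values

def draw(shape, bias):
    # prescription (2.2): +1 with probability (1 + bias)/2, else -1
    return np.where(rng.random(shape) < (1 + bias) / 2, 1, -1)

xi_in, xi_out = draw((p, N), m), draw((p, N_out), 0.0)
J = xi_out.T @ np.linalg.pinv(xi_in).T       # any learning rule could be used here

mu = 0                                        # pattern whose basin we probe
flips = rng.random(N) < (1 - q) / 2           # flip prob (1-q)/2 gives mean overlap q
sigma = np.where(flips, -xi_in[mu], xi_in[mu])

sigma_out = np.sign(J @ sigma)                # one application of rule (2.1)
print("input overlap  :", sigma @ xi_in[mu] / N)
print("output overlap :", sigma_out @ xi_out[mu] / N_out)
```

An output overlap close to 1 for q well below 1 is precisely what "large basin of attraction" means in the sections that follow.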